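We propose CX-ToM, short for counterfactual explanations with theory-of-mind, a new explainable AI (XAI) framework for explaining decisions made by a deep convolutional neural network (CNN). In contrast to current XAI methods that generate an explanation as a single-shot response, we pose explanation as an iterative communication process, i.e., a dialog, between the machine and the human user. More concretely, our CX-ToM framework generates a sequence of explanations in a dialog by mediating the differences between the minds of the machine and the human user. To do this, we use Theory of Mind (ToM), which helps us explicitly model the human's intention, the machine's mind as inferred by the human, and the human's mind as inferred by the machine. Moreover, most state-of-the-art XAI frameworks provide attention-based (or heat-map-based) explanations. In our work, we show that these attention-based explanations are not sufficient for increasing human trust in the underlying CNN model. In CX-ToM, we instead use counterfactual explanations called fault-lines, which we define as follows: given an input image I for which a CNN classification model M predicts class c_pred, a fault-line identifies the minimal semantic-level features (e.g., stripes on a zebra, the ears of a dog), referred to as explainable concepts, that need to be added to or deleted from I in order to alter the classification category of I by M to another specified class c_alt. We argue that, due to the iterative, conceptual, and counterfactual nature of CX-ToM explanations, our framework is practical and more natural for both expert and non-expert users to understand the internal workings of complex deep learning models. Extensive quantitative and qualitative experiments verify our hypotheses, demonstrating that CX-ToM significantly outperforms state-of-the-art explainable AI models.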
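The fault-line definition can be illustrated with a small sketch. It assumes a toy stand-in for the model M that scores each class by summing hand-picked weights over the concepts present in an image; the names `predict` and `fault_line`, the weights, and the brute-force search are all hypothetical and are not the method used in the paper.

```python
from itertools import combinations


def predict(weights, concepts):
    """Toy stand-in for M: pick the class with the highest summed concept weight."""
    scores = {
        cls: sum(w.get(concept, 0.0) for concept in concepts)
        for cls, w in weights.items()
    }
    return max(scores, key=scores.get)


def fault_line(weights, present, vocabulary, c_alt):
    """Find a smallest set of concept toggles that makes the toy model predict c_alt.

    Returns (added, deleted) concept lists, or None if no edit works.
    Subsets are tried in order of increasing size, so the first hit is minimal.
    """
    present = set(present)
    vocab = sorted(vocabulary)
    for size in range(len(vocab) + 1):
        for toggles in combinations(vocab, size):
            edited = present ^ set(toggles)  # toggle: add if absent, delete if present
            if predict(weights, edited) == c_alt:
                added = sorted(set(toggles) - present)
                deleted = sorted(set(toggles) & present)
                return added, deleted
    return None


# Hypothetical concept weights for two classes.
toy_weights = {
    "zebra": {"stripes": 2.0, "mane": 0.5, "hooves": 0.5},
    "horse": {"stripes": -1.0, "mane": 1.0, "hooves": 1.0},
}
image_concepts = {"stripes", "mane", "hooves"}
c_pred = predict(toy_weights, image_concepts)
edit = fault_line(toy_weights, image_concepts, image_concepts, "horse")
```

In this toy setting the search deletes a single concept to move the prediction from c_pred to c_alt. The exhaustive search grows exponentially with the number of concepts, so it only serves to make the definition concrete.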
Recurrent neural networks (RNNs) are the backbone of many text and speech applications. These architectures are typically made up of several computationally complex components, such as non-linear activation functions, normalization, bi-directional dependence, and attention. To maintain good accuracy, these components are frequently run using full-precision floating-point computation, making them slow, inefficient, and difficult to deploy on edge devices. In addition, the complex nature of these operations makes them challenging to quantize using standard quantization methods without a significant performance drop. We present a quantization-aware training method for obtaining a highly accurate integer-only recurrent neural network (iRNN). Our approach supports layer normalization, attention, and an adaptive piecewise linear (PWL) approximation of activation functions, to serve a wide range of state-of-the-art RNNs. The proposed method enables RNN-based language models to run on edge devices with a $2\times$ improvement in runtime and a $4\times$ reduction in model size while maintaining accuracy similar to that of their full-precision counterparts.
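As a rough illustration of what a piecewise linear approximation of an activation function looks like, the sketch below approximates tanh with uniformly spaced knots. This is an assumption made for simplicity: the abstract describes an adaptive PWL scheme and integer-only arithmetic, neither of which is shown here, and the helper names `build_pwl` and `eval_pwl` are hypothetical.

```python
import math
from bisect import bisect_right


def build_pwl(fn, lo, hi, pieces):
    """Sample fn at uniformly spaced knots on [lo, hi]; returns (knots, values)."""
    xs = [lo + (hi - lo) * i / pieces for i in range(pieces + 1)]
    ys = [fn(x) for x in xs]
    return xs, ys


def eval_pwl(xs, ys, x):
    """Evaluate the piecewise linear function, clamping outside the knot range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1  # index of the segment containing x
    slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
    return ys[i] + slope * (x - xs[i])


def max_abs_error(fn, xs, ys, lo, hi, samples=2001):
    """Largest absolute gap between fn and its PWL approximation on a dense grid."""
    grid = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
    return max(abs(fn(x) - eval_pwl(xs, ys, x)) for x in grid)


knots, values = build_pwl(math.tanh, -4.0, 4.0, 16)
error = max_abs_error(math.tanh, knots, values, -6.0, 6.0)
```

Each segment needs only one multiply and one add at inference time, which is what makes PWL forms attractive for fixed-point hardware. Placing knots adaptively, as the abstract describes, would concentrate segments where the function curves most.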
Self-supervised learning via masked prediction pre-training (MPPT) has shown impressive performance on a range of speech-processing tasks. This paper proposes a method to bias self-supervised learning towards a specific task. The core idea is to slightly fine-tune the model that is used to obtain the target sequence. This leads to better performance and a substantial increase in training speed. Furthermore, this paper proposes a variant of MPPT that allows low-footprint streaming models to be trained effectively by computing the MPPT loss on both masked and unmasked frames. These approaches are evaluated for automatic speech recognition on the Librispeech corpus, where 100 hours of data serve as the labelled data and 860 hours as the unlabelled data. The biased training outperforms the unbiased training by 15.5% after 250k updates and 23.8% after 100k updates on test-other. For the streaming models, the pre-training approach yields a reduction in word error rate of 44.1%.
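The difference between computing the loss on masked frames only and on masked plus unmasked frames can be sketched as follows. This is a minimal illustration that assumes a per-frame cross-entropy against discrete target labels and equal weighting of all selected frames; the function name `mppt_loss` and the example numbers are hypothetical, and the paper's actual loss may weight or combine frames differently.

```python
import math


def mppt_loss(frame_probs, targets, mask, include_unmasked=False):
    """Mean cross-entropy over masked frames, or over all frames if include_unmasked.

    frame_probs: per-frame probability distributions over target labels
    targets:     per-frame target label indices
    mask:        per-frame booleans, True where the input frame was masked
    """
    selected = [i for i, masked in enumerate(mask) if masked or include_unmasked]
    if not selected:
        raise ValueError("no frames selected for the loss")
    total = sum(-math.log(frame_probs[i][targets[i]]) for i in selected)
    return total / len(selected)


# Three frames, two target labels; only the first frame is masked.
probs = [[0.5, 0.5], [0.25, 0.75], [0.9, 0.1]]
labels = [0, 1, 0]
frame_mask = [True, False, False]

masked_only = mppt_loss(probs, labels, frame_mask)
all_frames = mppt_loss(probs, labels, frame_mask, include_unmasked=True)
```

With `include_unmasked=True`, every frame contributes a training signal rather than only the masked ones.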